Goto

Collaborating Authors

 slam method


Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM

Sun, Zhicong, Lo, Jacqueline, Hu, Jinxing

arXiv.org Artificial Intelligence

Simultaneous localization and mapping (SLAM) technology has recently achieved photorealistic mapping capabilities thanks to the real-time, high-fidelity rendering enabled by 3D Gaussian Splatting (3DGS). However, due to the static representation of scenes, current 3DGS-based SLAM encounters issues with pose drift and failure to reconstruct accurate maps in dynamic environments. To address this problem, we present D4DGS-SLAM, the first SLAM method based on 4DGS map representation for dynamic environments. By incorporating the temporal dimension into scene representation, D4DGS-SLAM enables high-quality reconstruction of dynamic scenes. Utilizing the dynamics-aware InfoModule, we can obtain the dynamics, visibility, and reliability of scene points, and filter out unstable dynamic points for tracking accordingly. When optimizing Gaussian points, we apply different isotropic regularization terms to Gaussians with varying dynamic characteristics. Experimental results on real-world dynamic scene datasets demonstrate that our method outperforms state-of-the-art approaches in both camera pose tracking and map quality.


DyPho-SLAM : Real-time Photorealistic SLAM in Dynamic Environments

Liu, Yi, Fan, Keyu, Lan, Bin, Liu, Houde

arXiv.org Artificial Intelligence

Visual SLAM algorithms have been enhanced through the exploration of Gaussian Splatting representations, particularly in generating high-fidelity dense maps. While existing methods perform reliably in static environments, they often encounter camera tracking drift and fuzzy mapping when dealing with the disturbances caused by moving objects. This paper presents DyPho-SLAM, a real-time, resource-efficient visual SLAM system designed to address the challenges of localization and photorealistic mapping in environments with dynamic objects. Specifically, the proposed system integrates prior image information to generate refined masks, effectively minimizing noise from mask misjudgment. Additionally, to enhance constraints for optimization after removing dynamic obstacles, we devise adaptive feature extraction strategies significantly improving the system's resilience. Experiments conducted on publicly dynamic RGB-D datasets demonstrate that the proposed system achieves state-of-the-art performance in camera pose estimation and dense map reconstruction, while operating in real-time in dynamic scenes.


Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting

Li, Boying, Hao, Vuong Chi, Stuckey, Peter J., Reid, Ian, Rezatofighi, Hamid

arXiv.org Artificial Intelligence

We propose Hier-SLAM++, a comprehensive Neuro-Symbolic semantic 3D Gaussian Splatting SLAM method with both RGB-D and monocular input featuring an advanced hierarchical categorical representation, which enables accurate pose estimation as well as global 3D semantic mapping. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making scene understanding particularly challenging and costly. To address this problem, we introduce a novel and general hierarchical representation that encodes both semantic and geometric information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs) as well as the 3D generative model. By utilizing the proposed hierarchical tree structure, semantic information is symbolically represented and learned in an end-to-end manner. We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Additionally, we propose an improved SLAM system to support both RGB-D and monocular inputs using a feed-forward model. To the best of our knowledge, this is the first semantic monocular Gaussian Splatting SLAM system, significantly reducing sensor requirements for 3D semantic understanding and broadening the applicability of semantic Gaussian SLAM system. We conduct experiments on both synthetic and real-world datasets, demonstrating superior or on-par performance with state-of-the-art NeRF-based and Gaussian-based SLAM systems, while significantly reducing storage and training time requirements.


VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes

Wu, Ke, Zhang, Zicheng, Tie, Muer, Ai, Ziqing, Gan, Zhongxue, Ding, Wenchao

arXiv.org Artificial Intelligence

VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: VIO Front End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End, RGB frames are processed through dense bundle adjustment and uncertainty estimation to extract scene geometry and poses. Based on this output, the mapping module incrementally constructs and maintains a 2D Gaussian map. Key components of the 2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement, which collectively improve mapping speed and localization accuracy. This enables the SLAM system to handle large-scale urban environments with up to 50 million Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS) capabilities of Gaussian Splatting for loop closure detection and correction of the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable presence of dynamic objects in real-world outdoor scenes. Extensive evaluations in indoor and outdoor environments demonstrate that our approach achieves localization performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF SLAM methods. It also significantly outperforms all existing methods in terms of mapping and rendering quality. Furthermore, we developed a mobile app and verified that our framework can generate high-quality Gaussian maps in real time using only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge, VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in outdoor environments and supporting kilometer-scale large scenes.


NeRF and Gaussian Splatting SLAM in the Wild

Schmidt, Fabian, Enzweiler, Markus, Valada, Abhinav

arXiv.org Artificial Intelligence

Navigating outdoor environments with visual Simultaneous Localization and Mapping (SLAM) systems poses significant challenges due to dynamic scenes, lighting variations, and seasonal changes, requiring robust solutions. While traditional SLAM methods struggle with adaptability, deep learning-based approaches and emerging neural radiance fields as well as Gaussian Splatting-based SLAM methods, offer promising alternatives. However, these methods have primarily been evaluated in controlled indoor environments with stable conditions, leaving a gap in understanding their performance in unstructured and variable outdoor settings. This study addresses this gap by evaluating these methods in natural outdoor environments, focusing on camera tracking accuracy, robustness to environmental factors, and computational efficiency, highlighting distinct trade-offs. Extensive evaluations demonstrate that neural SLAM methods achieve superior robustness, particularly under challenging conditions such as low light, but at a high computational cost. At the same time, traditional methods perform the best across seasons but are highly sensitive to variations in lighting conditions. The code of the benchmark is publicly available at https://github.com/iis-esslingen/nerf-3dgs-benchmark.


Lost in Tracking Translation: A Comprehensive Analysis of Visual SLAM in Human-Centered XR and IoT Ecosystems

Chandio, Yasra, Selialia, Khotso, DeGol, Joseph, Garcia, Luis, Anwar, Fatima M.

arXiv.org Artificial Intelligence

Advancements in tracking algorithms have empowered nascent applications across various domains, from steering autonomous vehicles to guiding robots to enhancing augmented reality experiences for users. However, these algorithms are application-specific and do not work across applications with different types of motion; even a tracking algorithm designed for a given application does not work in scenarios deviating from highly standard conditions. For example, a tracking algorithm designed for robot navigation inside a building will not work for tracking the same robot in an outdoor environment. To demonstrate this problem, we evaluate the performance of the state-of-the-art tracking methods across various applications and scenarios. To inform our analysis, we first categorize algorithmic, environmental, and locomotion-related challenges faced by tracking algorithms. We quantitatively evaluate the performance using multiple tracking algorithms and representative datasets for a wide range of Internet of Things (IoT) and Extended Reality (XR) applications, including autonomous vehicles, drones, and humans. Our analysis shows that no tracking algorithm works across different applications and scenarios within applications. Ultimately, using the insights generated from our analysis, we discuss multiple approaches to improving the tracking performance using input data characterization, leveraging intermediate information, and output evaluation.


Task-driven SLAM Benchmarking

Du, Yanwei, Feng, Shiyu, Cort, Carlton G., Vela, Patricio A.

arXiv.org Artificial Intelligence

For assistive robots, one critical use case of SLAM is to support localization as they navigate through an environment completing tasks. Current SLAM benchmarks do not consider task-based deployments where repeatability (precision) is more critical than accuracy. To address this gap, we propose a task-driven benchmarking framework for evaluating SLAM methods. The framework accounts for SLAM's mapping capabilities, employs precision as a key metric, and has low resource requirements to implement. Testing of state-of-the-art SLAM methods in both simulated and real-world scenarios provides insights into the performance properties of modern SLAM solutions. In particular, it shows that passive stereo SLAM operates at a level of precision comparable to LiDAR-based SLAM in typical indoor environments. The benchmarking approach offers a more relevant and accurate assessment of SLAM performance in task-driven applications.


CARLA-Loc: Synthetic SLAM Dataset with Full-stack Sensor Setup in Challenging Weather and Dynamic Environments

Han, Yuhang, Liu, Zhengtao, Sun, Shuo, Li, Dongen, Sun, Jiawei, Hong, Ziye, Ang, Marcelo H. Jr

arXiv.org Artificial Intelligence

The robustness of SLAM algorithms in challenging environmental conditions is crucial for autonomous driving, but the impact of these conditions are unknown while given the difficulty of arbitrarily changing the relevant environmental parameters of the same environment in the real world. Therefore, we propose CARLA-Loc, a synthetic dataset of challenging and dynamic environments built on CARLA simulator. We integrate multiple sensors into the dataset with strict calibration, synchronization and precise timestamping. 7 maps and 42 sequences are posed in our dataset with different dynamic levels and weather conditions. Objects in both stereo images and point clouds are well-segmented with their class labels. We evaluate 5 visual-based and 4 LiDAR-based approaches on varies sequences and analyze the effect of challenging environmental factors on the localization accuracy, showing the applicability of proposed dataset for validating SLAM algorithms.


The SLAM Hive Benchmarking Suite

Yang, Yuanyuan, Xu, Bowen, Li, Yinjie, Schwertfeger, Sören

arXiv.org Artificial Intelligence

Benchmarking Simultaneous Localization and Mapping (SLAM) algorithms is important to scientists and users of robotic systems alike. But through their many configuration options in hardware and software, SLAM systems feature a vast parameter space that scientists up to now were not able to explore. The proposed SLAM Hive Benchmarking Suite is able to analyze SLAM algorithms in 1000's of mapping runs, through its utilization of container technology and deployment in a cluster. This paper presents the architecture and open source implementation of SLAM Hive and compares it to existing efforts on SLAM evaluation. Furthermore, we highlight the function of SLAM Hive by exploring some open source algorithms on public datasets in terms of accuracy. We compare the algorithms against each other and evaluate how parameters effect not only accuracy but also CPU and memory usage. Through this we show that SLAM Hive can become an essential tool for proper comparisons and evaluations of SLAM algorithms and thus drive the scientific development in the research on SLAM.


Evaluation of Lidar-based 3D SLAM algorithms in SubT environment

Koval, Anton, Kanellakis, Christoforos, Nikolakopoulos, George

arXiv.org Artificial Intelligence

Autonomous navigation of robots in harsh and GPS denied subterranean (SubT) environments with lack of natural or poor illumination is a challenging task that fosters the development of algorithms for pose estimation and mapping. Inspired by the need for real-life deployment of autonomous robots in such environments, this article presents an experimental comparative study of 3D SLAM algorithms. The study focuses on state-of-the-art Lidar SLAM algorithms with open-source implementation that are i) lidar-only like BLAM, LOAM, A-LOAM, ISC-LOAM and hdl graph slam, or ii) lidar-inertial like LeGO-LOAM, Cartographer, LIO-mapping and LIO-SAM. The evaluation of the methods is performed based on a dataset collected from the Boston Dynamics Spot robot equipped with 3D lidar Velodyne Puck Lite and IMU Vectornav VN-100, during a mission in an underground tunnel. In the evaluation process poses and 3D tunnel reconstructions from SLAM algorithms are compared against each other to find methods with most solid performance in terms of pose accuracy and map quality.